PREFACE


This  research  describes  a  new  statistical  technique  developed  at 
Rand  to  assist  in  designing  an  efficient  experiment  associated  with 
recruiting  for  the  All  Volunteer  Force.  The  experiment  is  intended  to 
measure the effects of various cash bonus option incentives as inducements
to "high-quality" young men to join the Army in hard-to-fill
occupational  specialties.  This  Note  fully  describes  the  research 
related  to  this  new  experimental  design  technique,  and  can  therefore 
stand  alone.  Later  it  may  be  merged  with  one  or  more  other  Rand  Notes 
into  an  integrated  report  on  methodology  associated  with  the  Rand 
project  on  the  Enlistment  Bonus  Test. 


SUMMARY


This  Note  describes  a  new  statistical  technique  for  comparing 
unbalanced  experimental  designs  which  will  be  modeled  by  the  univariate 
analysis  of  covariance.  We  propose  minimizing  a  design  criterion  variable 
called  PISE  (percent  inflation  of  the  standard  error  of  a  contrast). 

The  research  was  motivated  by  the  need  to  design  an  experiment  to 
measure  the  effectiveness  of  a  potential  new  Army  recruitment  policy. 

The  policy  would  provide  greater  management  flexibility  in  paying  cash 
bonuses  to  eligible  "high-quality"  young  men  who  agree  to  enlist  in  the 
U.S. Army.

We provide results for both the standard Gauss-Markov model (constant
error variance) and the model with heteroscedasticity. We also
discuss the problem of attributing the increased variance caused by
imbalance in a design to particular covariates.

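When implemented, the proposed PISE criterion selects, from among the
candidate designs, the one whose contrasts suffer the least inflation of
their standard errors, and hence the design with the greatest sensitivity
to treatment effect differences.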
I. INTRODUCTION


This  research  was  motivated  by  the  need  to  minimize  the  cost  and  to 
maximize  the  sensitivity  of  an  experiment  Rand  is  designing  to  test  a  new 
U.S.  Army  military  recruitment  policy.  Since  1973  the  Army  has  been  an 
all  volunteer  force,  and  the  Army  has  therefore  been  faced  with  the  problem 
of  how  to  induce  eligible  people  to  volunteer  for  service.  At  the  same 
time,  it  has  been  trying  to  ensure  that  an  appropriate  fraction  of  those 
who  volunteer  will  be  of  "high  quality,"  will  be  agreeable  to  being 
assigned  to  the  specialty  areas  of  interest  to  the  Army  at  the  current 
time,  and  will  be  agreeable  to  serving  for  a  reasonable  length  of  time. 

To  achieve  these  goals,  the  Army  has  been  studying  various  incentive 
systems  for  recruiting,  and  has  been  testing  these  systems  on  a  small 
scale  before  implementing  them  all  over  the  country. 

The  incentive  plan  which  stimulated  the  research  reported  on  here 
is  called  the  bonus  test.  An  Army  recruit  is  paid  a  given  dollar  bonus 
if  he  agrees  to  enlist  for  a  designated  number  of  years  (we  use  the 
term  "he"  since  women  have  been  excluded  from  the  test  because  of  their 
small numbers). In the experiment the continental United States is
subdivided  into  three  (unequal)  parts.  One  part,  the  "control  cell," 
will  continue  to  employ  the  current  recruiting  incentive  system.  The 
other  two  parts  of  the  country,  called  "test  cells,"  will  each  employ  a 
different  type  of  bonus  system.  After  letting  the  experiment  run  for  a 
year  or  more,  we  will  compare  the  numbers  of  recruits  in  the  test  cells 
with  the  number  in  the  control  cell  to  try  to  determine  the  differential 
effects  of  the  different  bonus  plans. 

We  will  of  course  have  to  account,  and  adjust,  for  different  economic 
and  social  conditions  in  the  different  cells.  Differences  across  the 
cells  are  compared  statistically  with  "the  analysis  of  variance"  model. 

To  "correct"  the  results  of  such  a  model  for  differential  effects  across 
the  cells  that  are  not  bonus-induced  (treatment)  differences  involves  a 
slight  modification  of  the  model.  The  modified  model  is  called  "analysis 
of covariance" (ANOCOVA). The correcting variables are called covariates.
Ideally,  we  would  understand  enough  about  the  underlying  (recruitment) 
process  to  select  covariates  that  will  assign  most  of  the  variability 
in  the  dependent  variable  to  specific  causes,  so  that  the  residuals 
in the model are only randomly associated with the treatments (see Rubin,
1974). Usually, we do not have complete understanding, and we try to
compensate for the resulting model misspecification error through a
pre-experiment scheme called "balancing," discussed below.

In designing an experiment that we plan to analyze using ANOCOVA,
we usually try to assign subjects to treatments so that there is balance
across cells (in our problem, there is a sample mean vector of covariates
for each of the three cells, and we try to effect an assignment which will
equalize the three mean vectors). The reason for this design objective
is that while estimators of the unknown parameters (and contrasts) remain
unbiased regardless of imbalance, so long as the assumptions of the
classical Gauss-Markov model are satisfied (see Haggstrom, 1975, P-5449,
p. 37), the variances of the contrasts are, in general, inflated when there
is imbalance. As a result, a greatly unbalanced design can make it very
difficult to detect meaningful differences across test cells (since the
difference in effects observed in the experiment might be attributed to
sampling variation instead of to differences caused by treatment effects).
In  experiments  in  which  the  covariates  may  be  readily  controlled  by  the 
experimenter,  such  as  those  that  typically  take  place  in  a  physical  science 
context,  it  is  often  not  too  difficult  to  achieve  the  objective  of  a 
balanced  design.  In  social  experimentation,  however,  some  of  the 
covariates  are  usually  not  very  easily  controlled,  so  that  it  is  rarely 
possible  to  achieve  perfect  balance. 

The  problem  of  balance  in  social  experiments  was  studied  by  Morris, 
1979,  in  the  context  of  a  social  experiment  on  health  insurance  designed 
at  The  Rand  Corporation.  His  procedure,  called  the  Finite  Selection 
Model,  is  designed  to  select  an  optimal  subset  of  subjects  from  a  finite 
population  available  for  potential  experimentation,  on  the  basis  of  the 
values  of  covariates  each  subject  is  likely  to  have  during  the  experiment. 
These  subjects  are  selected  one  at  a  time  by  his  model  using  a  nonlinear 
integer  programming  procedure  with  a  steepest  descents  algorithm.  The 
criterion  for  selection  of  subjects  involves  minimizing  a  weighted  sum 
of  variances  of  linear  functions  of  the  estimated  parameters  of  interest, 
subject to constraints. This minimization is carried out using
pre-experimental data comparable to the anticipated experimental data. The
Morris approach follows the approach used earlier by Conlisk and Watts,
1969, although the latter procedure did not employ pre-experimental
data.

The solution proposed here is appropriate for quite a different
context. We describe in this Note a procedure for minimizing imbalance
in experimental design in a context in which all subjects are potentially
available for experimentation, regardless of their anticipated covariate
values. Our problem is how to allocate all available subjects to
treatments, and how to set tolerances on the differences of cell means of
covariates, so that the design is almost balanced. The criterion
function, which we minimize, is the average PISE, or percent inflation of
the standard error of a linear function of the parameters of interest
(averaged over all of the contrasts of interest). Although both the
Morris and Press procedures for balancing include both first and second
moments, the Press procedure compares the PISEs for various possible
configurations of subject-to-treatment allocations of the entire
population, whereas the Morris procedure adds subjects to the design one by one.

So the implementing algorithms are quite different (aside from the fact
that one criterion uses inflated standard deviation, while the other
uses total variance). The Press procedure has the advantage over other
procedures  of  providing  a  "natural  scale"  for  the  effects  of  imbalance. 
The  imbalance  scale  we  use  measures  the  percent  increase  of  the  standard 
errors  of  the  quantities  of  interest  over  what  the  standard  errors  would 
be  in  a  perfectly  balanced  situation. 

In  a  social  experiment  in  which  we  are  attempting  to  detect  an 
effect  attributable  to  some  treatment  (as  in  the  bonus  test),  we  often 
want  to  minimize  the  standard  errors  of  the  contrasts  of  interest  (in 
order  to  maximize  the  power  of  tests  of  significance  for  the  contrasts). 
Such  a  problem  involves  both  selection  of  appropriate  covariates  and 
balancing  across  cells.  Given  a  set  of  covariates,  the  balancing  problem 
reduces  to  minimizing  the  PISE. 

Using the PISE criterion, percent inflation of standard errors for
the various designs considered for the bonus test ranged from about
3 percent, for the best design, to as much as 25 percent, depending upon
which design, which covariate cell mean inequality tolerances, and which
contrasts of interest were selected. Average PISE over all three
contrasts of potential interest in the bonus test ran as high as 18 percent
for some designs considered (before we selected the best one).

In the remainder of this Note we develop statistical models for
evaluating the effects of imbalance on the standard errors of the
contrasts across the cells. The PISE criterion can be used for
comparing potential unbalanced designs on the basis of how much a given
design inflates the standard errors of the contrasts (and we propose
selecting the design which effects the best compromise between close
balance on some covariates and not such close balance on others, so
that, overall, inflation of the standard error is minimized for the
average of all contrasts of interest). We adopt some of the notation
used in Haggstrom, 1975. Our main result is given in Theorem (1).

Section II examines the problem for the classical Gauss-Markov
model. Section III provides a result for the more general
heteroscedastic model (which is the model we plan to use for the bonus
test design). Section IV discusses the problems associated with attempting
to attribute the difficulties that arise in connection with imbalance
to particular covariates.
II. INFLATION OF STANDARD ERRORS IN CLASSICAL ANOCOVA MODEL


We adopt the classical ANOCOVA, one-way layout, fixed-effects
model (for given $x_{ij}$ and $z_i$),

$$ y_i = \sum_{j=1}^{p} \beta_j x_{ij} + \gamma' z_i + e_i, \qquad i = 1, \ldots, n, \qquad (2.1) $$

where $y_i$ denotes the response of subject $i$; there are $p$
groups or cells, with $n_1, n_2, \ldots, n_p$ subjects assigned to the
respective cells; $n = \sum_j n_j$ denotes the total number of subjects;
$x_{ij}$ is one or zero, depending upon whether or not the $i$th subject
is assigned to the $j$th group; $z_i$ denotes an $(h \times 1)$ vector of
covariates for subject $i$; $(\beta_j, \gamma)$ are unknown coefficients
that must be estimated from the data; and the $e_i$ denote mutually
uncorrelated disturbance terms with means zero and variances not
depending upon $i$. Prime denotes the transposed vector.

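An alternative formulation involves writing

$$ \beta_j = \tau_j + \alpha, \qquad (2.2) $$

where $\alpha$ denotes the ambient effect, or the effect in the absence of any
treatments, and $\tau_j$ denotes the effect of treatment $j$ on a subject in
cell $j$. In this formulation, since each subject belongs to exactly one
cell ($\sum_{j=1}^{p} x_{ij} = 1$),

$$ y_i = \sum_{j=1}^{p} (\tau_j + \alpha) x_{ij} + \gamma' z_i + e_i = \alpha + \sum_{j=1}^{p} \tau_j x_{ij} + \gamma' z_i + e_i. \qquad (2.3) $$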


In our problem of bonus test recruiting, the subjects in the
experiment are recruiting centers called Armed Forces Entrance and
Examining Stations (AFEES). There are 64 AFEES, so that $n = 64$.
There are three groups (cells), so that $p = 3$. The continental
United States is subdivided into the 64 AFEES named and delineated
in Fig. 1.

The balancing problem in the bonus test is to allocate each of the
64 AFEES into one of three cells (which will receive three distinct
treatments, that is, three different bonus plans will be used) so that
the mean vectors of the covariates of the AFEES in the three cells are
equal. We will see below that the effect of such balancing is to
minimize the standard deviations of all estimated contrasts (comparisons of
treatment effects).

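We now reformulate (2.1) in vector terms, to simplify the algebra.
Let $\beta = (\beta_j)$ and $x_i = (x_{ij})$ be $(p \times 1)$ vectors, and let
$\delta = (\beta', \gamma')'$ and $w_i = (x_i', z_i')'$ be $((p + h) \times 1)$ vectors. Now, (2.1) becomes

$$ y_i = \delta' w_i + e_i, \qquad (2.4) $$

$i = 1, \ldots, n$. If we let $e = (e_i)$ denote the $(n \times 1)$ vector of disturbances, and write the model as

$$ Q = e'e = \sum_{i=1}^{n} (y_i - \delta' w_i)^2, \qquad E(e) = 0, \qquad \operatorname{var}(e) = \sigma^2 I, $$

it is readily seen that $Q$ is minimized by taking

$$ \hat\delta = A^{-1} g, \qquad (2.5) $$

where, for $A_{11}$: $(p \times p)$ and $A_{22}$: $(h \times h)$,

$$ A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = \begin{pmatrix} \sum_i x_i x_i' & \sum_i x_i z_i' \\ \sum_i z_i x_i' & \sum_i z_i z_i' \end{pmatrix}, \qquad g = \begin{pmatrix} g_1 \\ g_2 \end{pmatrix} = \begin{pmatrix} \sum_i x_i y_i \\ \sum_i z_i y_i \end{pmatrix}, \qquad (2.6) $$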
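To make (2.5) and (2.6) concrete, here is a minimal Python sketch, assuming NumPy is available. The function name `ancova_normal_equations`, the `cells` encoding (cell labels 0 to p-1), and the synthetic data are illustrative assumptions, not part of the bonus test design. The sketch forms $A$ and $g$ exactly as in (2.6) and then solves $A\hat\delta = g$.

```python
import numpy as np


def ancova_normal_equations(cells, Z, y, p):
    """Form A and g of (2.6) and solve A @ delta_hat = g, as in (2.5).

    cells : length-n integer array; cells[i] in {0, ..., p-1} is subject i's cell
    Z     : (n, h) array whose ith row is the covariate vector z_i'
    y     : length-n response vector
    """
    n = len(cells)
    X = np.zeros((n, p))
    X[np.arange(n), cells] = 1.0       # indicator x_ij = 1 if subject i is in cell j
    W = np.hstack([X, Z])              # ith row is w_i' = (x_i', z_i')
    A = W.T @ W                        # sum_i w_i w_i'
    g = W.T @ y                        # sum_i w_i y_i
    delta_hat = np.linalg.solve(A, g)  # (beta_hat', gamma_hat')'
    return A, g, delta_hat, W


# Hypothetical example: 30 subjects, 3 cells of 10, 2 covariates.
rng = np.random.default_rng(0)
n, p, h = 30, 3, 2
cells = np.arange(n) % p
Z = rng.normal(size=(n, h))
y = 1.0 + cells + Z @ np.array([0.5, -0.3]) + rng.normal(scale=0.1, size=n)
A, g, delta_hat, W = ancova_normal_equations(cells, Z, y, p)
print(delta_hat[:p])                   # estimated cell effects beta_1, beta_2, beta_3
```

Solving the linear system yields the same $\hat\delta$ as $A^{-1}g$ but avoids forming the inverse explicitly. The upper-left $p \times p$ block of `A` comes out as $\operatorname{diag}(n_1, \ldots, n_p)$, the fact used in (2.16).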
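and $\hat\delta$ is the least squares estimator of $\delta$. We must, of course, have
$A > 0$; i.e., $A$ is positive definite (and symmetric). It is also easy
to find (from (2.5)) that, since $\operatorname{var}(g) = \sigma^2 A$,

$$ \operatorname{var}(\hat\delta) = A^{-1} \operatorname{var}(g) A^{-1} = \sigma^2 A^{-1}. \qquad (2.7) $$

Define

$$ B \equiv A^{-1} = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}. \qquad (2.8) $$

Then,

$$ \operatorname{var}(\hat\delta) = \operatorname{var}\begin{pmatrix} \hat\beta \\ \hat\gamma \end{pmatrix} = \sigma^2 B. $$

Thus, using, for example, Press, 1982, equations (2.6.3) and (2.6.4),

$$ \operatorname{var}(\hat\beta) = \sigma^2 B_{11} = \sigma^2 (A_{11} - A_{12} A_{22}^{-1} A_{21})^{-1} \equiv \Sigma_\beta, \qquad (2.9) $$

$$ \operatorname{var}(\hat\gamma) = \sigma^2 B_{22} = \sigma^2 (A_{22} - A_{21} A_{11}^{-1} A_{12})^{-1} \equiv \Sigma_\gamma. \qquad (2.10) $$

We also note in passing that $\operatorname{cov}(\hat\beta, \hat\gamma) = \sigma^2 B_{12}$.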

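Let $\psi$ denote a contrast in the effects of the experiment; i.e.,
$\psi = c'\beta$, where $c = (c_j)$ and $\sum_{j=1}^{p} c_j = 0$. For example, for $p = 3$, if we
take $c = (-1, 1, 0)'$, then $\psi = \beta_2 - \beta_1$. If $\hat\beta$ denotes the least squares estimator
of $\beta$, then $\hat\psi = c'\hat\beta$ denotes the least squares estimator of $\psi$. Thus,

$$ \Delta \equiv \operatorname{var}(\hat\psi) = c' [\operatorname{var}(\hat\beta)] c = c' \Sigma_\beta c. \qquad (2.11) $$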
We next evaluate $\Sigma_\beta$. From (2.5) and (2.6), $A\hat\delta = g$ implies $A_{11}\hat\beta + A_{12}\hat\gamma = g_1$, so that

$$ \hat\beta = A_{11}^{-1} g_1 - A_{11}^{-1} A_{12} \hat\gamma. \qquad (2.12) $$


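It is well known that the error and estimation spaces in the general
linear model (under the Gauss-Markov assumptions that we have made)
are orthogonal (see, e.g., Scheffé, 1959, p. 23). So $\hat\gamma$ and $g_1$ (depending
only on the $y_i$'s, or the errors, for given $w_i$'s) are uncorrelated.
(Directly: $\hat\gamma = B_{21} g_1 + B_{22} g_2$ and $\operatorname{var}(g) = \sigma^2 A$, so
$\operatorname{cov}(g_1, \hat\gamma) = \sigma^2 (A_{11} B_{12} + A_{12} B_{22})$, which is the upper right block of
$\sigma^2 AB = \sigma^2 I$ and hence zero.) Therefore, from (2.12),

$$ \Sigma_\beta = \operatorname{var}(A_{11}^{-1} g_1) + \operatorname{var}(A_{11}^{-1} A_{12} \hat\gamma), $$

or, from (2.11),

$$ \Delta = c' A_{11}^{-1} (\operatorname{var} g_1) A_{11}^{-1} c + c' A_{11}^{-1} A_{12} (\operatorname{var} \hat\gamma) A_{21} A_{11}^{-1} c. \qquad (2.13) $$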


From (2.6) and (2.7), note that

$$ \operatorname{var}(g_1) = \sum_{i=1}^{n} x_i x_i' \operatorname{var}(y_i) = \sigma^2 \sum_{i=1}^{n} x_i x_i' = \sigma^2 A_{11}. \qquad (2.14) $$


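Substituting (2.14) into (2.13) gives

$$ \Delta = \sigma^2 c' A_{11}^{-1} c + c' A_{11}^{-1} A_{12} (\operatorname{var} \hat\gamma) A_{21} A_{11}^{-1} c. \qquad (2.15) $$

Now note that since $x_{ij}$ denotes an indicator variable, $x_{ij}^2 = x_{ij}$,
and $x_{ij} x_{ik} = 0$ for $j \ne k$ (each subject is in exactly one cell),
so that (using (2.6))

$$ A_{11} = \sum_{i=1}^{n} x_i x_i' = \operatorname{diag}(n_1, \ldots, n_p). \qquad (2.16) $$

Thus,

$$ c' A_{11}^{-1} c = \sum_{j=1}^{p} \frac{c_j^2}{n_j}. \qquad (2.17) $$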


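Substituting (2.17) into (2.15) gives

$$ \Delta = \sigma^2 \sum_{j=1}^{p} \frac{c_j^2}{n_j} + \epsilon, \qquad (2.18) $$

where

$$ \epsilon = c' A_{11}^{-1} A_{12} \Sigma_\gamma A_{21} A_{11}^{-1} c. \qquad (2.19) $$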


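A similar formulation was given by Harville (1975), p. 220. The value of
$\epsilon$ is given in the theorem below. First define

$$ \underset{(p \times h)}{Z} = (\bar z_1, \ldots, \bar z_p)', \qquad (2.20) $$

$$ \underset{(h \times h)}{\Phi} = \sum_{i=1}^{n} z_i z_i' - \sum_{j=1}^{p} n_j \bar z_j \bar z_j' = \sum_{j=1}^{p} \sum_{i=1}^{n} x_{ij} (z_i - \bar z_j)(z_i - \bar z_j)', \qquad (2.21) $$

and

$$ \underset{(h \times 1)}{\bar z_j} = (\bar z_{j1}, \ldots, \bar z_{jh})', \qquad \bar z_{jk} = \frac{1}{n_j} \sum_{i=1}^{n} z_{ik} x_{ij}. \qquad (2.22) $$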


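Theorem (1): If $\epsilon$ is the portion of the variance of $\hat\psi$ defined in (2.19),
$\epsilon$ is given by the quadratic form

$$ \epsilon = \sigma^2 c' W c = \sigma^2 c' (Z \Phi^{-1} Z') c, \qquad W \equiv Z \Phi^{-1} Z'. \qquad (2.23) $$

Proof: See the Appendix.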
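Theorem (1) can also be checked numerically. The sketch below (Python with NumPy; the function names and the random 64-subject design are hypothetical) computes $\Delta$ in two ways: directly from the $(1,1)$ block of $A^{-1}$ as in (2.8)-(2.11), and as $\sigma^2 \sum_j c_j^2/n_j$ plus the quadratic form (2.23). For any design with $A$ positive definite the two values should agree to rounding error.

```python
import numpy as np


def indicator_matrix(cells, p):
    """n x p matrix of the indicators x_ij."""
    X = np.zeros((len(cells), p))
    X[np.arange(len(cells)), cells] = 1.0
    return X


def contrast_variance_direct(cells, Z, c, p, sigma2=1.0):
    """Delta = c' Sigma_beta c, with Sigma_beta = sigma2 * B_11 read off B = A^{-1}."""
    W = np.hstack([indicator_matrix(cells, p), Z])
    B = np.linalg.inv(W.T @ W)
    return sigma2 * c @ B[:p, :p] @ c


def contrast_variance_theorem(cells, Z, c, p, sigma2=1.0):
    """Delta = sigma2 * sum_j c_j^2 / n_j + eps, with eps from (2.23)."""
    n_j = np.bincount(cells, minlength=p).astype(float)
    Zbar = np.vstack([Z[cells == j].mean(axis=0) for j in range(p)])  # (2.20), (2.22)
    Phi = Z.T @ Z - (Zbar.T * n_j) @ Zbar                            # (2.21)
    ztc = Zbar.T @ c                                                 # Z'c = sum_j c_j zbar_j
    eps = sigma2 * ztc @ np.linalg.solve(Phi, ztc)                   # (2.23)
    return sigma2 * np.sum(c**2 / n_j) + eps, eps


rng = np.random.default_rng(1)
n, p, h = 64, 3, 2
cells = rng.permutation(np.arange(n) % p)   # 22, 21, 21 subjects per cell
Z = rng.normal(size=(n, h))
c = np.array([-1.0, 1.0, 0.0])
direct = contrast_variance_direct(cells, Z, c, p)
via_theorem, eps = contrast_variance_theorem(cells, Z, c, p)
print(direct, via_theorem, eps)
```

The agreement is an instance of the partitioned-inverse identity behind (2.9): $B_{11}$ equals $A_{11}^{-1}$ plus the correction term that the Appendix reduces to $Z\Phi^{-1}Z'$.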

Remark: Note that $\Phi$ is the matrix of sums of squares and crossproducts
within cells. Therefore, $\epsilon$ is a measure of the imbalance of the design $Z$, as
measured in its natural metric. $\epsilon$ is also the inflated portion of $\operatorname{var}(\hat\psi)$.

Referring back to the variance of any contrast, given in (2.18),
note that the first term of that expression is strictly positive and
does not involve the covariates, while $\epsilon \ge 0$ because $W$ is positive
semidefinite ($\Phi$ is positive definite because $A$ is). Therefore, for given
cell sizes $n_j$, $\Delta$ is minimized when $\epsilon = 0$. This case is made specific in
the corollary below.
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1) :  If  the  sample  means  of  the  covariate  vectors  are  equal, 

Z_  Zo  •  *  •  z  Z  } 

1  *  P 

e,  defined  in  (2.19),  vanishes. 

Remark (1): This well-known result (cf. Haggstrom, 1975, P-5449, p. 37)
occurs when the design is balanced. Thus, when the design is balanced,
$W$, the matrix of the positive semidefinite quadratic form in (2.23), is
not of full rank ($c'Wc = 0$ for every contrast vector $c \ne 0$).


Remark (2): Note that the "balanced design" condition of Corollary (1)
is only a sufficient condition for $\epsilon = 0$. There are other conditions that
would also make $\epsilon = 0$ (we merely need $Z'c = \sum_{j=1}^{p} c_j \bar z_j = 0$; so, for example,
the combination of $p = 3$, $c = (-1, 1, 0)'$, $\bar z_1 = \bar z_2$, and $\bar z_3$ = anything
would also yield $\epsilon = 0$). If, however, we wanted a balanced design
regardless of which contrast was of interest, i.e., we wanted $\epsilon = 0$ for
every possible contrast, we would need $Z = (\bar z, \ldots, \bar z)'$.


Proof of Corollary (1): Under the hypothesis of the corollary,
$Z = (\bar z, \ldots, \bar z)'$. Substituting into (2.23) gives (since $\psi$ is a contrast)

$$ Z'c = \bar z \sum_{j=1}^{p} c_j = 0. \qquad (2.24) $$

So $\epsilon = 0$, and $\Delta$ is minimized. This result becomes transparent by noting
that $c'(Z\Phi^{-1}Z')c$ in (2.23) does not change if we subtract a constant
vector $z_0$ from each column of $Z'$, and replace $Z'$ by $Z_0' = (\bar z_1 - z_0, \ldots, \bar z_p - z_0)$:
because $\sum_j c_j = 0$, $Z_0'c = Z'c - z_0 \sum_j c_j = Z'c$.
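A toy design illustrates Corollary (1), Remark (2), and the shift argument in the proof. The data below are invented for illustration (one covariate, two subjects per cell, Python with NumPy; `imbalance_eps` is a hypothetical helper). Cells 1 and 2 share the covariate mean 2 while cell 3 has mean 6, so $\epsilon$ vanishes for $c = (-1, 1, 0)'$ but not for $c = (-1, 0, 1)'$, where $\Phi = 12$ and $Z'c = 4$ give $\epsilon = 16/12$ with $\sigma^2 = 1$.

```python
import numpy as np


def imbalance_eps(cells, Z, c, p, sigma2=1.0):
    """eps = sigma2 * c' Z Phi^{-1} Z' c, as in (2.23)."""
    n_j = np.bincount(cells, minlength=p).astype(float)
    Zbar = np.vstack([Z[cells == j].mean(axis=0) for j in range(p)])
    Phi = Z.T @ Z - (Zbar.T * n_j) @ Zbar    # within-cell sums of squares
    ztc = Zbar.T @ c
    return sigma2 * ztc @ np.linalg.solve(Phi, ztc)


# Hypothetical toy design: one covariate, two subjects per cell.
cells = np.array([0, 0, 1, 1, 2, 2])
Z = np.array([[1.0], [3.0], [0.0], [4.0], [5.0], [7.0]])   # cell means 2, 2, 6
c1 = np.array([-1.0, 1.0, 0.0])   # cells 1 and 2: equal means, so eps = 0
c2 = np.array([-1.0, 0.0, 1.0])   # cells 1 and 3: unequal means, so eps > 0
eps1 = imbalance_eps(cells, Z, c1, 3)
eps2 = imbalance_eps(cells, Z, c2, 3)
eps2_shifted = imbalance_eps(cells, Z - 10.0, c2, 3)   # subtract z_0 = 10 everywhere
print(eps1, eps2, eps2_shifted)
```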

INFLATION  OF  STANDARD  ERRORS 

Suppose  we  wish  to  design  an  experiment  in  which  we  know  we  will 
have  to  tolerate  some  degree  of  imbalance.  How  do  we  decide  among  various 
unbalanced  designs?  We  propose  below  a  natural  criterion  for  making  such 
a decision, in light of Theorem (1).
Define $\Delta_0$ as the value of $\Delta$ (defined in (2.11)) produced in the
case of a balanced design. That is,

$$ \Delta_0 = \sigma^2 \sum_{j=1}^{p} \frac{c_j^2}{n_j}. \qquad (2.25) $$


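Define the percent inflation of the standard error (PISE) of any
contrast $\psi = c'\beta$, for any covariate design $Z = (\bar z_1, \ldots, \bar z_p)'$, as

$$ \text{PISE} = \left( \frac{\sqrt{\Delta} - \sqrt{\Delta_0}}{\sqrt{\Delta_0}} \right)(100). \qquad (2.26) $$

It should be noted that PISE in Eq. (2.26) does not depend upon $\sigma^2$:
by (2.18), (2.23), and (2.25), $\Delta/\Delta_0 = 1 + c'Wc \big/ \sum_j (c_j^2/n_j)$, and
$\text{PISE} = 100(\sqrt{\Delta/\Delta_0} - 1)$.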
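A minimal sketch of the PISE computation in (2.26), in Python with NumPy, reusing an invented six-subject design; `pise` is a hypothetical helper name. Because $\sigma^2$ cancels in $\Delta/\Delta_0$, the sketch sets $\sigma^2 = 1$.

```python
import numpy as np


def pise(cells, Z, c, p):
    """Percent inflation of the standard error, (2.26), for contrast vector c.

    sigma^2 is set to 1 because it cancels in Delta / Delta_0.
    """
    n_j = np.bincount(cells, minlength=p).astype(float)
    Zbar = np.vstack([Z[cells == j].mean(axis=0) for j in range(p)])
    Phi = Z.T @ Z - (Zbar.T * n_j) @ Zbar
    ztc = Zbar.T @ c
    eps = ztc @ np.linalg.solve(Phi, ztc)   # (2.23) with sigma^2 = 1
    delta0 = np.sum(c**2 / n_j)             # (2.25)
    return 100.0 * (np.sqrt((delta0 + eps) / delta0) - 1.0)


# Hypothetical toy design: one covariate, cell means 2, 2, 6.
cells = np.array([0, 0, 1, 1, 2, 2])
Z = np.array([[1.0], [3.0], [0.0], [4.0], [5.0], [7.0]])
print(pise(cells, Z, np.array([-1.0, 0.0, 1.0]), 3))   # about 52.75 percent
```

Here $\Delta_0 = 1$ and $\epsilon = 4/3$, so the standard error of $\hat\beta_3 - \hat\beta_1$ is inflated by a factor $\sqrt{7/3}$; the contrast $(-1, 1, 0)'$ in the same design has PISE 0.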

Suppose  we  are  trying  to  compare  various  designs  which  have  the 
same  contrast  of  interest,  but  we  are  free  to  vary  Z  so  that  subjects 
can  be  allocated  to  different  treatments  in  various  ways.  If  we  can 
predetermine Z (at least approximately), based upon external data
sources,  for  each  contending  design,  we  can  also  evaluate  PISE  for 
each  such  design.  We  might  then  select  that  design  possessing  the 
minimum  value  of  PISE.  Alternatively,  by  careful  study  of  PISE 
for  various  competing  designs,  we  might  decide  that  although  one 
design  had  a  higher  PISE  than  another,  the  difference  was  small 
enough  to  be  tolerable,  and  the  one  with  the  slightly  higher  PISE 
should  be  selected  in  light  of  other  economic,  political,  or  social 
factors;  by  examining  the  PISE  we  can  evaluate  the  "cost"  of  such 
a  tradeoff,  in  terms  of  effectiveness  of  the  experiment  (loss  of 
probability  of  finding  the  effect  for  which  we  are  designing). 

Suppose, alternatively, that there are $r$ contrasts of interest, and
we are free to vary $Z$ so that subjects can be allocated to different
treatments in various ways. We assume that some contrasts may be more
important than others. Let $q_i$ denote the weight to be placed on the
$i$th contrast, $0 < q_i < 1$, with $\sum_{i=1}^{r} q_i = 1$. We might now select a design
that possesses the minimum (weighted) average value of PISE:
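$$ \text{PISE} = \sum_{i=1}^{r} q_i \left( \frac{\sqrt{\Delta_i} - \sqrt{\Delta_{0i}}}{\sqrt{\Delta_{0i}}} \right)(100), \qquad (2.27) $$

where $\Delta_i \equiv \operatorname{var}(\hat\psi_i) = \operatorname{var}(C_i'\hat\beta)$, $\Delta_{0i}$ is the corresponding balanced-design
variance (2.25), and $C_i$: $(p \times 1)$ denotes the weight vector for the $i$th contrast.
This type of averaging is in the spirit of Cox, 1982, p. 197.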

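For example, suppose there are three cells (two test cells and a
control cell) and two simple contrasts of interest, namely

$$ \psi_1 = \beta_2 - \beta_1, \qquad \psi_2 = \beta_3 - \beta_1. $$

Then $C_1 = (-1, 1, 0)'$ and $C_2 = (-1, 0, 1)'$. Suppose further that $\psi_1$
and $\psi_2$ are of equal interest and importance, so that $q_1 = q_2 = 1/2$.
Then, from (2.27), for $r = 2$, if $\Delta_{01} = \Delta_{02} \equiv \Delta_0$ (which, by (2.25), holds when $n_2 = n_3$),

$$ \text{PISE} = \left( \frac{\sqrt{\Delta_1} + \sqrt{\Delta_2} - 2\sqrt{\Delta_0}}{2\sqrt{\Delta_0}} \right)(100). $$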
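The averaged criterion (2.27) can be used to compare candidate allocations. The Note does not prescribe a particular search at this point, so the random search below is only one hypothetical way to generate competing designs (Python with NumPy; `average_pise` and `best_random_allocation` are illustrative names). The covariates are synthetic stand-ins, not AFEES data, and the cell sizes 22, 21, 21 are chosen so that $n_2 = n_3$, matching the condition in the example above.

```python
import numpy as np


def pise(cells, Z, c, p):
    """PISE of (2.26) with sigma^2 = 1 (it cancels)."""
    n_j = np.bincount(cells, minlength=p).astype(float)
    Zbar = np.vstack([Z[cells == j].mean(axis=0) for j in range(p)])
    Phi = Z.T @ Z - (Zbar.T * n_j) @ Zbar
    ztc = Zbar.T @ c
    delta0 = np.sum(c**2 / n_j)
    return 100.0 * (np.sqrt(1.0 + ztc @ np.linalg.solve(Phi, ztc) / delta0) - 1.0)


def average_pise(cells, Z, contrasts, weights, p):
    """Weighted average PISE of (2.27)."""
    return sum(q * pise(cells, Z, c, p) for q, c in zip(weights, contrasts))


def best_random_allocation(Z, sizes, contrasts, weights, n_trials, seed=0):
    """Hypothetical search: draw random allocations with fixed cell sizes and
    keep the one with the smallest average PISE."""
    rng = np.random.default_rng(seed)
    base = np.repeat(np.arange(len(sizes)), sizes)
    best_score, best_cells = np.inf, None
    for _ in range(n_trials):
        cells = rng.permutation(base)
        score = average_pise(cells, Z, contrasts, weights, len(sizes))
        if score < best_score:
            best_score, best_cells = score, cells
    return best_score, best_cells


# Synthetic stand-in for 64 sites with 3 covariates.
rng = np.random.default_rng(2)
Z = rng.normal(size=(64, 3))
contrasts = [np.array([-1.0, 1.0, 0.0]), np.array([-1.0, 0.0, 1.0])]
score, cells = best_random_allocation(Z, [22, 21, 21], contrasts, [0.5, 0.5], n_trials=500)
print(score)
```

In practice the random draws would be replaced by whatever set of feasible allocations is actually under consideration, and the PISE of each contrast could be inspected separately before accepting the minimizer.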
III. INFLATION OF STANDARD ERRORS IN WEIGHTED ANOCOVA


The model treated in Sec. II adopted the Gauss-Markov assumptions
$E(e) = 0$, $E(e_i e_j) = 0$, $i \ne j$, and $\operatorname{var}(e_i) = \sigma^2$, $i = 1, \ldots, n$. But
suppose, alternatively, that

$$ \operatorname{var}(e_i) = a_i \sigma^2, \qquad (3.1) $$

while the other assumptions remain the same. That is, we have
heteroscedasticity. Suppose the $a_i$'s are known constants, however. We see
below that the results of Sec. II are readily extended to cover this
case as well. (This is the case of interest in the bonus experiment.)

Adopt the model of (2.1), but with the heteroscedasticity assumption
of (3.1). Define the transformed variables:

$$ y_i^* = y_i a_i^{-1/2}, \qquad e_i^* = e_i a_i^{-1/2}, \qquad z_i^* = z_i a_i^{-1/2}, \qquad x_{ij}^* = x_{ij} a_i^{-1/2}. \qquad (3.2) $$

The transformed model becomes (for given $x_{ij}^*$ and $z_i^*$)

$$ y_i^* = \sum_{j=1}^{p} \beta_j x_{ij}^* + \gamma' z_i^* + e_i^*, \qquad (3.3) $$

for $i = 1, \ldots, n$, with

$$ E(e^*) = 0, \qquad E(e_i^* e_j^*) = 0, \ i \ne j, \qquad \operatorname{var}(e_i^*) = \sigma^2. \qquad (3.4) $$


The model now has the general form assumed in Sec. II, in terms of
transformed variables. So the corresponding results are immediately
obtainable. The basic result in Theorem (1) becomes (instead of (2.23))

$$ \epsilon^* = \sigma^2 c' W^* c = \sigma^2 c' (Z^* \Phi^{*-1} Z^{*\prime}) c, \qquad (3.5) $$

where

$$ \underset{(p \times h)}{Z^*} = (\bar z_1^*, \ldots, \bar z_p^*)', \qquad (3.6) $$

$$ \underset{(h \times 1)}{\bar z_j^*} = (\bar z_{j1}^*, \ldots, \bar z_{jh}^*)', \qquad \bar z_{jk}^* = \frac{1}{n_j^*} \sum_{i=1}^{n} z_{ik}^* x_{ij}^*, \qquad (3.7) $$

$$ n_j^* = \sum_{i=1}^{n} (x_{ij}^*)^2 = \sum_{i=1}^{n} \frac{x_{ij}^2}{a_i} = \sum_{i=1}^{n} \frac{x_{ij}}{a_i}, \qquad (3.8) $$

and

$$ \underset{(h \times h)}{\Phi^*} = \sum_{i=1}^{n} z_i^* z_i^{*\prime} - \sum_{j=1}^{p} n_j^* \bar z_j^* \bar z_j^{*\prime}. \qquad (3.9) $$

Note that Eqs. (3.7) through (3.9) are weighted averages, weighted sums,
or weighted sums of squares, as contrasted with their unweighted analogs
in Sec. II.

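The analog of (2.18) becomes (the argument of Sec. II carries over because
$x_{ij}^* x_{ik}^* = 0$ for $j \ne k$, so that $A_{11}^* = \sum_i x_i^* x_i^{*\prime} = \operatorname{diag}(n_1^*, \ldots, n_p^*)$)

$$ \Delta = \sigma^2 \sum_{j=1}^{p} \frac{c_j^2}{n_j^*} + \epsilon^*, \qquad (3.10) $$

with $\epsilon^*$ defined in (3.5), and $\Delta \equiv \operatorname{var}(\hat\psi) = \operatorname{var}(c'\hat\beta)$. So if

$$ \Delta_0 = \sigma^2 \sum_{j=1}^{p} \frac{c_j^2}{n_j^*} \qquad (3.11) $$

denotes the variance of a contrast for a balanced design (one with $\epsilon^* = 0$),
the selection criterion becomes, for the case of heteroscedasticity,

$$ \text{PISE} = \left( \frac{\sqrt{\Delta} - \sqrt{\Delta_0}}{\sqrt{\Delta_0}} \right)(100), \qquad (3.12) $$

where $\Delta$ and $\Delta_0$ are defined in (3.10) and (3.11). Note again, as in (2.26),
that PISE does not depend upon $\sigma^2$.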

To apply the PISE criterion in the case of heteroscedasticity we
first transform the covariate vectors to form $Z^*$ and $\Phi^*$. Then, for a
given contrast of interest (choice of $c$), we can evaluate PISE for
various competing designs.
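For the heteroscedastic case, the starred quantities (3.6)-(3.9) can be computed by weighting each subject by $1/a_i$. The sketch below (Python with NumPy; `weighted_pise`, the random design, and the random $a_i$ are illustrative assumptions) returns PISE from (3.12) together with $\Delta$ from (3.10), with $\sigma^2 = 1$. As a check, $\Delta$ can be compared with the contrast variance read off the inverse of the transformed normal-equations matrix.

```python
import numpy as np


def weighted_pise(cells, Z, a, c, p):
    """PISE of (3.12) when var(e_i) = a_i * sigma^2, via (3.5)-(3.11) with sigma^2 = 1.

    Weighting the untransformed z_i by 1/a_i is algebraically the same as using
    the starred variables x*_ij = x_ij a_i^{-1/2} and z*_i = z_i a_i^{-1/2} of (3.2).
    """
    w = 1.0 / np.asarray(a, dtype=float)
    n_star = np.array([w[cells == j].sum() for j in range(p)])            # (3.8)
    Zbar_star = np.vstack([w[cells == j] @ Z[cells == j] / n_star[j]
                           for j in range(p)])                            # (3.6), (3.7)
    Phi_star = (Z.T * w) @ Z - (Zbar_star.T * n_star) @ Zbar_star        # (3.9)
    ztc = Zbar_star.T @ c
    eps_star = ztc @ np.linalg.solve(Phi_star, ztc)                       # (3.5)
    delta0 = np.sum(c**2 / n_star)                                        # (3.11)
    delta = delta0 + eps_star                                             # (3.10)
    return 100.0 * (np.sqrt(delta / delta0) - 1.0), delta


rng = np.random.default_rng(3)
n, p, h = 64, 3, 2
cells = rng.permutation(np.arange(n) % p)
Z = rng.normal(size=(n, h))
a = rng.uniform(0.5, 2.0, size=n)          # hypothetical known variance factors a_i
c = np.array([-1.0, 1.0, 0.0])
pise_star, delta = weighted_pise(cells, Z, a, c, p)
print(pise_star)
```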

$\sigma^2$ is usually not known, but in some situations we may wish to
estimate it. For example, suppose our original dependent variables are
$Y_i$, and the $Y_i$'s are independently distributed Poisson variates with
means $\lambda_i$. If we transform the model so that $y_i = \log(Y_i + 1/2)$, where
$y_i$ is given by (2.1), it may be shown (see, e.g., Cox, 1955) that,
approximately,

$$ \operatorname{var}(y_i) = \operatorname{var}[\log(Y_i + 1/2)] \approx \frac{1}{\lambda_i}. \qquad (3.13) $$

But in the Poisson distribution, we can use the approximation

$$ \lambda_i = E(Y_i) \approx Y_i. $$

Substituting in (3.13) gives

$$ \operatorname{var}(y_i) \approx \frac{1}{Y_i}. $$

But from (3.1) we can take

$$ \operatorname{var}(y_i) = \operatorname{var}(e_i) = a_i \sigma^2 = \frac{1}{Y_i}. $$

So we can take $\sigma^2 = 1$ in this instance, but take

$$ a_i = \frac{1}{Y_i}. \qquad (3.14) $$

Thus, we adopt the model in (3.3), with $a_i$ as in (3.14) and $\sigma^2 = 1$. We
approximate $Y_i$, prior to an experiment, by using previously obtained data
comparable to $Y_i$.
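For the Poisson case, the weights (3.14) and the approximation (3.13) can be sketched as follows (Python with NumPy; `poisson_log_weights` and the value $\lambda = 50$ are illustrative assumptions). The guard against zero counts is an addition of this sketch, since (3.14) is undefined when a prior count $Y_i$ is zero.

```python
import numpy as np


def poisson_log_weights(prior_counts):
    """a_i = 1 / Y_i of (3.14), with sigma^2 = 1, from pre-experimental counts."""
    Y = np.asarray(prior_counts, dtype=float)
    if np.any(Y <= 0):
        raise ValueError("(3.14) requires strictly positive counts Y_i")
    return 1.0 / Y


# Monte Carlo check of the approximation (3.13): var[log(Y + 1/2)] is close to 1/lambda.
rng = np.random.default_rng(0)
lam = 50.0
sample = rng.poisson(lam, size=200_000)
simulated = np.var(np.log(sample + 0.5))
print(simulated, 1.0 / lam)
```

The resulting `a` vector can be passed as the variance factors $a_i$ in the heteroscedastic PISE computation of Sec. III.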
IV. THE PROBLEM OF ATTRIBUTING INFLATED STANDARD ERRORS TO A GIVEN SOURCE


Once  the  degree  of  inflation  of  the  standard  errors  of  contrasts 
caused  by  an  unbalanced  design  has  been  determined,  it  is  tempting  to  try 
to  separate  the  sources  of  the  inflation  by  attributing  these  sources  to  the 
distinct  covariates.  This  is  very  difficult  to  do,  except  in  unusual 
circumstances.  For  simplicity,  we  examine  the  nature  of  this  problem 
in  terms  of  variances  instead  of  standard  deviations,  and  exhibit  the 
behavior  of  the  inflation  explicitly  for  several  sharply  defined 
situations  relating  to  a  single  contrast.  Other  cases  are  more  complicated. 

The portion of a contrast variance attributable to imbalance of the
design is given in Theorem (1), (2.23), for the classical Gauss-Markov
model as

$$ \epsilon = \sigma^2 c' (Z \Phi^{-1} Z') c. \qquad (4.1) $$

That is, $\epsilon$ denotes the inflation of the variance.

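Now suppose that all of the covariates are mutually uncorrelated (within
cells), so that $\Phi$ is diagonal, with $\alpha$th diagonal element $n\sigma_\alpha^2$, where
$\sigma_\alpha^2$ denotes the (sample) variance of the $\alpha$th covariate. Then

$$ \epsilon = \sigma^2 \sum_{\alpha=1}^{h} \sum_{i=1}^{p} \sum_{j=1}^{p} c_i c_j \frac{\bar z_{i\alpha} \bar z_{j\alpha}}{n \sigma_\alpha^2} = \sum_{\alpha=1}^{h} f_\alpha, $$

where

$$ f_\alpha = \frac{\sigma^2}{n \sigma_\alpha^2} \sum_{i=1}^{p} \sum_{j=1}^{p} c_i c_j \bar z_{i\alpha} \bar z_{j\alpha}. \qquad (4.2) $$

Define

$$ P_\alpha = \frac{f_\alpha}{\sum_{\alpha'=1}^{h} f_{\alpha'}}. \qquad (4.3) $$

Note that $P_\alpha$ is the proportion of the inflated variance attributable to
the $\alpha$th covariate.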

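Consider a particular (simple) contrast,

$$ \underset{(p \times 1)}{c} = (-1, 1, 0, \ldots, 0)'. $$

In this case, from (4.2),

$$ f_\alpha = \frac{\sigma^2 (\bar z_{2\alpha} - \bar z_{1\alpha})^2}{n \sigma_\alpha^2}. $$

Let $C_\alpha = \bar z_{1\alpha} - \bar z_{2\alpha}$ denote the balancing "tolerance" for covariate $\alpha$.
That is, when we attempt to balance the design, we attempt to make the
sample means for all covariates the same in all of the cells; $C_\alpha$ is
the amount by which the cell means differ for cells 1 and 2, for
covariate $\alpha$. Thus,

$$ f_\alpha = \frac{\sigma^2 C_\alpha^2}{n \sigma_\alpha^2}, $$

and, from (4.3),

$$ P_\alpha = \frac{C_\alpha^2 / \sigma_\alpha^2}{\sum_{\alpha'=1}^{h} (C_{\alpha'}^2 / \sigma_{\alpha'}^2)}. \qquad (4.4) $$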


Note that for a particular $\alpha$, say $\alpha = 1$, we may express (4.4) in the form

$$ P_1 = \frac{C_1^2 / \sigma_1^2}{(C_1^2 / \sigma_1^2) + K} = \frac{1}{1 + K (\sigma_1^2 / C_1^2)}, \qquad (4.5) $$

where $K = \sum_{\alpha=2}^{h} (C_\alpha^2 / \sigma_\alpha^2)$. Figure 2 shows how the proportion of inflated
variance attributable to the first covariate varies with balancing tolerance
and covariate standard deviation. We see that $P_1$ increases with the squared
balancing tolerance $C_1^2$, so that the further apart the covariate cell means,
the greater the variance inflation ($f_1$ increases as the square of the
tolerance). Moreover, $P_1$ decreases with increasing covariate standard
deviation, so it is greatest when the first covariate is not free to vary
much. Thus, referring to Eq. (4.2), when the covariates are uncorrelated
we can minimize the inflated portion of the total variance by choosing
covariates which achieve acceptably low values of $(C_\alpha^2 / \sigma_\alpha^2)$. Accordingly,
if there were a choice between two covariates, each of which reduced $\sigma^2$
by the same amount, but one had a larger $\sigma_\alpha^2$, we should select the one
with the larger $\sigma_\alpha^2$.

[Fig. 2. Proportion of inflated variance attributable to balancing tolerance
and covariate standard deviation. Axis label: covariate standard deviation.]
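The shares (4.4) and the rearrangement (4.5) are simple enough to tabulate directly, which gives a numerical feel for the behavior shown in Fig. 2. The sketch below (Python with NumPy) assumes uncorrelated covariates and the simple contrast $c = (-1, 1, 0, \ldots, 0)'$ as in the text; the tolerances and standard deviations are made-up values, and the function names are hypothetical.

```python
import numpy as np


def covariate_shares(tolerances, sds):
    """P_alpha of (4.4): share of the inflated variance due to each covariate.

    tolerances : C_alpha, difference of the cell-1 and cell-2 means per covariate
    sds        : sigma_alpha, standard deviation of each covariate
    """
    ratio = (np.asarray(tolerances, dtype=float) / np.asarray(sds, dtype=float)) ** 2
    return ratio / ratio.sum()


def first_share_via_K(tolerances, sds):
    """The same P_1 written as in (4.5): 1 / (1 + K * sigma_1^2 / C_1^2)."""
    C = np.asarray(tolerances, dtype=float)
    s = np.asarray(sds, dtype=float)
    K = np.sum((C[1:] / s[1:]) ** 2)
    return 1.0 / (1.0 + K * (s[0] / C[0]) ** 2)


# Made-up values: widening the first tolerance raises P_1,
# while a larger first-covariate standard deviation lowers it.
print(covariate_shares([1.0, 2.0], [1.0, 1.0])[0])   # 0.2
print(covariate_shares([2.0, 2.0], [1.0, 1.0])[0])   # 0.5
print(covariate_shares([2.0, 2.0], [2.0, 1.0])[0])   # 0.2
```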

Because the covariates in most ANOCOVA designs are correlated, we
cannot, in general, break out the effects due to each covariate
separately. The greater the correlations among covariates, the more
difficult it will be to analyze the individual covariate effects
separately, as in Fig. 2. In the general case, therefore, we must be
content to evaluate the PISE criterion for the design as a whole, and
then to make PISE as small as possible.


Appendix 


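Proof of Theorem (1): From the definition of $\epsilon$ in (2.19), and that of
$\Sigma_\gamma$ in (2.10), we have

$$ \epsilon = \sigma^2 c' A_{11}^{-1} A_{12} (A_{22} - A_{21} A_{11}^{-1} A_{12})^{-1} A_{21} A_{11}^{-1} c. \qquad (A.1) $$

From (2.6),

$$ \underset{(p \times h)}{A_{12}} = \sum_{i=1}^{n} x_i z_i' = \begin{pmatrix} n_1 \bar z_1' \\ \vdots \\ n_p \bar z_p' \end{pmatrix}. $$

Using (2.16) and (2.20), we find the representation

$$ A_{12} = A_{11} Z, \qquad A_{21} = Z' A_{11}. \qquad (A.2) $$

Substituting (A.2) into (A.1) gives

$$ \epsilon = \sigma^2 c' Z (A_{22} - Z' A_{11} Z)^{-1} Z' c. \qquad (A.3) $$

From (2.6) and (2.21), it is readily seen that

$$ \Phi = A_{22} - Z' A_{11} Z. \qquad (A.4) $$

Substituting (A.4) into (A.3) gives the result in (2.23).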